Abstract

Given a large social or computer network, how can we visualize it and
find patterns, outliers and communities? Although several graph
visualization tools exist, they cannot handle large graphs with hundreds
of thousands of nodes and possibly millions of edges. Such graphs bring
two challenges: interactive visualization demands prohibitive processing
power and, even if we could interactively update the visualization, the
user would be overwhelmed by the excessive number of graphical items. To
cope with this problem, we propose a formal innovation on the use of
graph hierarchies that leads to the GMine system. GMine promotes
scalability using a hierarchy of graph partitions, promotes concomitant
presentation of the graph hierarchy and of the original graph, and
extends analytical possibilities by integrating the graph partitions in
an interactive environment.

1. Introduction 

Current applications produce graphs on the order of hundreds of
thousands of nodes and possibly millions of edges (referred to from here
on as large graphs). Large graphs are found in numerous real-life
settings: web graphs (web pages pointing to others with hypertext
links), computer communication graphs (IP addresses sending packets to
other IP addresses), recommendation systems, who-trusts-whom networks
and bipartite graphs of web logs (who visits which page), to name a few.
For such domains, graph visualization becomes prohibitive because the
excessive processing power requirements prevent interaction. Besides
that, hundred-thousand-node drawings result in unintelligible, cluttered
images that do not aid the user’s cognition.

To face these challenges, previous works (section 2) propose to present
large graphs based on a hierarchy of graph partitions. However, these
efforts fail at the task of integrating the groups of nodes that
constitute the levels of the hierarchy. In these propositions, the graph
hierarchy is “dead” and cannot answer questions such as: What is the
relation between a given group of nodes and another group of nodes? How
many edges connect these two groups? Which are they? Which graph nodes
from other groups connect to a graph node of interest? These questions
translate to the possibility of using the original graph information
concomitantly with its hierarchical version. In such a scenario, it is
possible to benefit from both structures in parallel or in cooperation.
The main contribution of this work is the delineation of a system that
can answer these questions dynamically and present the answers visually.

We review related works in section 2; in section 3 we present basic
concepts. Section 4 introduces new definitions for hierarchies of graph
partitions and section 5 explains how to use these concepts in a
suitable data structure. Section 6 clarifies the construction of the
data structure that supports our system and section 7 presents the
experiments. Section 8 concludes the paper.

2. Related Work 

In the literature, there are several works that deal with the problem of
visualizing large graphs. Munzner [6] proposes the H3 system, which
deals with visual overload by using a specific spanning tree and manages
scalability with an innovative dynamic hyperbolic layout. Different from
our work, that system is based on single-resolution visual exploration
and therefore has limited scalability. Schaffer et al. [8] compare
full-zoom navigation techniques and the fisheye view for drawing
clustered graphs. Walshaw and Cross [9] work on the issue of
hierarchically partitioning a graph, while Eades and Feng [3]
(multilevel layout) and Frishman and Tal [4] (dynamic layout) propose
algorithms for determining the layout of clustered graphs. These works,
though, do not consider the complete information of the groups of edges
between the graph partitions and, instead of embedding the original
graph in a supporting structure, they lose the graph if it is not kept
in a parallel structure. Scalability is not considered in these works.
These characteristics seriously limit their propositions, which lack
interaction and data retrieval tasks.

Papadopoulos and Voglis [7] propose to draw a graph based on the graph
modular decomposition theory [1]. Their work explores a recursive
tree-like partition of a graph to draw different levels of a graph
modular hierarchy. It is not a complete system, but a description of how
to arrange the modules of a graph from different hierarchical levels.
Interaction details are omitted. Eades [2] also benefits from the
recursive tree-like partition of a graph. His work proposes
single-resolution planar drawings that reflect the underlying structure
of a clustered graph. His main motivation is improved aesthetics and not
scalability.

Our system is based on any kind of hierarchy of graph partitions, which
can be manually determined by the analyst or, for large graphs, can be
automatically determined by a proper methodology. In our experiments, we
apply a methodology named k-way partitioning. That is, given a graph
$G = (V, E)$ with $|V| = n$ nodes and $|E|$ edges, we want to have $k$
subsets $V_1, V_2, \ldots, V_k$ of $V$, such that
$V_i \cap V_j = \emptyset$ for $i \neq j$, $|V_i| = n/k$ and
$\bigcup_i V_i = V$. Also, the partitioning must minimize the number of
edges of $E$ whose incident vertices belong to different subsets. This
partitioning methodology is described by Karypis and Kumar [5].
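The conditions above can be checked mechanically. The following is a minimal sketch, not the partitioner of Karypis and Kumar [5]: it only verifies that a given set of subsets is a balanced k-way partition and counts the edges that cross subsets (the quantity the partitioning tries to minimize). The function names `is_balanced_partition` and `edge_cut` are illustrative assumptions, and the data is the eight-node graph used later in figure 1.

```python
from itertools import combinations

def is_balanced_partition(nodes, parts):
    """Check the k-way conditions: disjoint subsets, equal sizes, full coverage."""
    disjoint = all(a.isdisjoint(b) for a, b in combinations(parts, 2))
    covers = set().union(*parts) == set(nodes)
    equal_size = len({len(p) for p in parts}) == 1
    return disjoint and covers and equal_size

def edge_cut(edges, parts):
    """Count the edges whose two endpoints belong to different subsets."""
    owner = {v: i for i, part in enumerate(parts) for v in part}
    return sum(1 for u, v in edges if owner[u] != owner[v])

# Eight-node example graph (the graph G of figure 1).
V = range(1, 9)
E = [(1, 2), (3, 4), (5, 6), (7, 8), (2, 3), (2, 4),
     (5, 7), (6, 8), (1, 5), (1, 7), (4, 7)]
parts = [{1, 2, 3, 4}, {5, 6, 7, 8}]   # a 2-way partition (k = 2)

print(is_balanced_partition(V, parts), edge_cut(E, parts))
```

For this partition the cut consists of the edges (1,5), (1,7) and (4,7).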

3. Basic Terminology 

In this work, a hierarchy of graph partitions is called a SuperGraph.
The underlying data beneath a SuperGraph is a graph $G = \{V, E\}$, but
a SuperGraph presents a different abstracting structure. It benefits
from the fact that the entities and the relationships of the graph $G$
can be grouped according to the relationships that they define. In a
SuperGraph, each of these groups of nodes is treated as a subgraph. This
concept allows us to work with a graph as a set of hierarchically
defined partitions. In what follows, we define the constituents of a
SuperGraph together with an illustrative example given in figure 1.

Graph and SuperGraph 

Given a finite undirected graph $G = \{V, E\}$, with no loops nor
multiple edges, a SuperGraph is a set
$\overline{G} = \{\overline{V}, \overline{V_l}, \overline{E}\}$. More
specifically, a SuperGraph is composed of a set $\overline{V}$ of
SuperNodes $\overline{v}$, a set $\overline{V_l}$ of LeafSuperNodes
$\overline{v_l}$ and a set $\overline{E}$ of SuperEdges $\overline{e}$.
In what follows we define LeafSuperNode, SuperNode and SuperEdge.

LeafSuperNodes and SuperNodes 

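Given a subset $V' \subset V$, a LeafSuperNode $\overline{v_l}$ is the
subgraph $G|V'$ induced by $V'$. That is,
$G|V' = \overline{v_l} = G' = \{V', E'\}$, $E' \subseteq E$. The node
sets of the LeafSuperNodes are totally disjoint, that is:

$$V'_i \cap V'_j = \emptyset, \quad \text{for all } \overline{v_l}_i, \overline{v_l}_j \in \overline{V_l},\ i \neq j, \text{ with } \overline{v_l}_i = \{V'_i, E'_i\} \tag{1}$$

The union of the nodes of all the LeafSuperNodes of a SuperGraph equals
the set of nodes $V$. This fact is illustrated in the list of SuperNodes
in figure 1 and defined as follows:

$$\bigcup_i V'_i = V, \quad \text{for all } \overline{v_l}_i \in \overline{V_l}, \text{ with } \overline{v_l}_i = \{V'_i, E'_i\} \tag{2}$$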

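A SuperNode $\overline{v}$ is defined as a set of SuperNodes
$\overline{v}$ or, exclusively, as a set of LeafSuperNodes
$\overline{v_l}$, plus a set of SuperEdges $\overline{e}$, as follows:

$$\overline{v} = \{V' = \{\overline{v}_0, \overline{v}_1, \ldots, \overline{v}_{|V'|-1}\},\ E' = \{\overline{e}_{i,j} \mid \{\overline{v}_i, \overline{v}_j\} \subseteq V'\}\}$$

OR

$$\overline{v} = \{V' = \{\overline{v_l}_0, \overline{v_l}_1, \ldots, \overline{v_l}_{|V'|-1}\},\ E' = \{\overline{e}_{j,k} \mid \{\overline{v_l}_j, \overline{v_l}_k\} \subseteq V'\}\} \tag{3}$$

where the SuperEdge concept, $\overline{e}$, is defined further in this
section.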

Closure of a SuperNode 

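In a SuperGraph, the closure of a SuperNode, or LeafSuperNode,
$\overline{v}$ is the set of all the graph nodes $v \in V$ that,
ultimately, belong to SuperNode $\overline{v}$. That is, given a
SuperNode $\overline{v} = \{V', E'\}$, the closure of $\overline{v}$ is
given by the recursive definition:

$$Closure(\overline{v}) = \begin{cases} V', & \text{if } \overline{v} = \{V', E'\} \text{ is a LeafSuperNode} \\ \bigcup Closure(\overline{v}_i), \text{ for } \overline{v}_i \in \overline{v}, & \text{otherwise} \end{cases} \tag{4}$$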

For example, in figure 1 we have
$Closure(\overline{v}_2) = Closure(\overline{v_l}_5) \cup Closure(\overline{v_l}_6) = \{5,6\} \cup \{7,8\} = \{5,6,7,8\}$.
Also in the graph of figure 1, $Closure(\overline{v}_0) = V$. This last
equality holds for the root of any SuperGraph. The closure of a
SuperNode corresponds to the nodes that comprise its community.
Accordingly, at the lowest level of the tree (at the leaves) a community
is a subgraph. At the highest level of the tree (at the root) the
community is the entire graph. Intuitively, we refer to the parent of a
SuperNode $\overline{w}$ as $Parent(\overline{w}) = \overline{v}$ if
$\overline{w} \in V'$, $V' \in \overline{v} = \{V', E'\}$. We refer to
the set of parents of a SuperNode $\overline{w}$ as the set
$Parents(\overline{w}) = \{\overline{v} \mid \overline{w} \in Closure(\overline{v}) \text{ and } \overline{v} \in \overline{V}\}$.
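The recursive closure and the chain of parents can be illustrated on the hierarchy of figure 1. This is a minimal sketch under assumed names: the dictionaries `SONS` and `LEAF_NODES` and the string identifiers (`"v0"`, `"vl3"`, and so on) are illustrative encodings of the SuperNodes and LeafSuperNodes, not part of the original system.

```python
# Hierarchy of figure 1: each SuperNode lists its sons,
# each LeafSuperNode lists its graph nodes.
SONS = {"v0": ["v1", "v2"], "v1": ["vl3", "vl4"], "v2": ["vl5", "vl6"]}
LEAF_NODES = {"vl3": {1, 2}, "vl4": {3, 4}, "vl5": {5, 6}, "vl6": {7, 8}}

def closure(name):
    """Graph nodes that ultimately belong to the given (Leaf)SuperNode."""
    if name in LEAF_NODES:            # base case: a LeafSuperNode
        return set(LEAF_NODES[name])
    result = set()
    for son in SONS[name]:            # recursive case: union over the sons
        result |= closure(son)
    return result

def parents(name):
    """All SuperNodes above `name`, nearest parent first."""
    parent_of = {son: p for p, sons in SONS.items() for son in sons}
    chain = []
    while name in parent_of:
        name = parent_of[name]
        chain.append(name)
    return chain

print(closure("v2"), parents("vl5"))
```

Here the closure of the root `"v0"` is the whole node set, matching the equality stated above.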

SuperEdges 

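A SuperEdge $\overline{e}_{i,j}$ corresponds to the SuperEdge for
SuperNodes $\overline{v}_i$ and $\overline{v}_j$. This SuperEdge holds
the edges $e \in E$ that connect graph nodes from SuperNode
$\overline{v}_i$ to graph nodes from SuperNode $\overline{v}_j$. A
SuperEdge $\overline{e}_{k,k}$ corresponds to the SuperEdge for
LeafSuperNode $\overline{v_l}_k$. This SuperEdge holds the edges that
interconnect graph nodes in LeafSuperNode $\overline{v_l}_k$ and
corresponds to $E'_k$. That is, $\overline{e}_{k,k} = E'_k$ for
$\overline{v_l}_k = \{V'_k, E'_k\}$. Also, for a SuperEdge
$\overline{e}$, $weight(\overline{e}) = |\overline{e}|$; for an edge
$e = \{v_p, v_q\}$, $source(e) = v_p$ and $target(e) = v_q$ (although we
are assuming undirected graphs). Formally, a SuperEdge is defined as
follows:

$$\overline{e}_{i,j} = \{e \mid e \in E,\ source(e) \in Closure(\overline{v}_i) \text{ and } target(e) \in Closure(\overline{v}_j)\} \tag{5}$$

The union of the SuperEdges of all the SuperNodes together with the
union of the SuperEdges of all the LeafSuperNodes equals the set of
edges $E$, as follows:

$$\Big(\bigcup \overline{e}_{i,j}\Big) \cup \Big(\bigcup \overline{e}_{k,k}\Big) = E, \quad \text{for } \{\overline{v}_i, \overline{v}_j\} \subseteq \overline{V} \text{ and } \overline{v_l}_k \in \overline{V_l} \tag{6}$$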
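Equations 5 and 6 can be checked on the example of figure 1. The sketch below is only illustrative: `CLOSURE` is a hand-written table of closures, the string identifiers are assumed encodings, and edges are matched in both orientations because the graph is undirected.

```python
E = [(1, 2), (3, 4), (5, 6), (7, 8), (2, 3), (2, 4),
     (5, 7), (6, 8), (1, 5), (1, 7), (4, 7)]

# Closures of the SuperNodes and LeafSuperNodes of figure 1.
CLOSURE = {"v1": {1, 2, 3, 4}, "v2": {5, 6, 7, 8},
           "vl3": {1, 2}, "vl4": {3, 4}, "vl5": {5, 6}, "vl6": {7, 8}}

def super_edge(a, b):
    """Edges with one endpoint in Closure(a) and the other in Closure(b)."""
    ca, cb = CLOSURE[a], CLOSURE[b]
    return {(u, v) for u, v in E
            if (u in ca and v in cb) or (u in cb and v in ca)}

def weight(s_edge):
    """Weight of a SuperEdge: the number of edges it holds."""
    return len(s_edge)

sibling_pairs = [("v1", "v2"), ("vl3", "vl4"), ("vl5", "vl6")]
leaves = ["vl3", "vl4", "vl5", "vl6"]
all_super_edges = ([super_edge(a, b) for a, b in sibling_pairs] +
                   [super_edge(leaf, leaf) for leaf in leaves])

print(super_edge("v1", "v2"), weight(super_edge("v1", "v2")))
```

Taking the union of `all_super_edges` recovers every edge of the graph, which is what equation 6 states.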


4. Extending the SuperGraph concept 

The SuperGraph concept is a succinct representation for a hierarchy of
graph partitions. However, hierarchies of graph partitions do not hold
the original graph structure, which is inevitably lost when the
hierarchical representation is used. In this section we define further
concepts in order to extend the possibilities of a SuperGraph. Our aim
is to answer the questions raised in section 1 by dynamically restoring
the original graph information.


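Graph G

$G = \{V, E\}$, with
$V = \{1,2,3,4,5,6,7,8\}$
$E = \{(1,2), (3,4), (5,6), (7,8), (2,3), (2,4), (5,7), (6,8), (1,5), (1,7), (4,7)\}$

SuperGraph $\overline{G}$

$\overline{G} = \{\overline{V}, \overline{V_l}, \overline{E}\}$, with
$\overline{V} = \{\overline{v}_0, \overline{v}_1, \overline{v}_2\}$
$\overline{V_l} = \{\overline{v_l}_3, \overline{v_l}_4, \overline{v_l}_5, \overline{v_l}_6\}$
$\overline{E} = \{\overline{e}_{1,2}, \overline{e}_{3,4}, \overline{e}_{5,6}, \overline{e}_{3,3}, \overline{e}_{4,4}, \overline{e}_{5,5}, \overline{e}_{6,6}\}$

SuperNodes

$\overline{v}_0 = \{\{\overline{v}_1, \overline{v}_2\}, \{\overline{e}_{1,2}\}\}$
$\overline{v}_1 = \{\{\overline{v_l}_3, \overline{v_l}_4\}, \{\overline{e}_{3,4}\}\}$
$\overline{v}_2 = \{\{\overline{v_l}_5, \overline{v_l}_6\}, \{\overline{e}_{5,6}\}\}$

LeafSuperNodes

$\overline{v_l}_3 = \{\{1,2\}, \{\overline{e}_{3,3}\}\}$
$\overline{v_l}_4 = \{\{3,4\}, \{\overline{e}_{4,4}\}\}$
$\overline{v_l}_5 = \{\{5,6\}, \{\overline{e}_{5,5}\}\}$
$\overline{v_l}_6 = \{\{7,8\}, \{\overline{e}_{6,6}\}\}$

SuperEdges

$\overline{e}_{1,2} = \{(1,5), (1,7), (4,7)\}$
$\overline{e}_{3,4} = \{(2,3), (2,4)\}$
$\overline{e}_{5,6} = \{(5,7), (6,8)\}$
$\overline{e}_{3,3} = \{(1,2)\}$, $\overline{e}_{4,4} = \{(3,4)\}$
$\overline{e}_{5,5} = \{(5,6)\}$, $\overline{e}_{6,6} = \{(7,8)\}$

Internal Edges

$InternalEdges(\overline{v_l}_3) = \{(1,2)\}$, $InternalEdges(\overline{v_l}_5) = \{(5,6)\}$
$InternalEdges(\overline{v_l}_4) = \{(3,4)\}$, $InternalEdges(\overline{v_l}_6) = \{(7,8)\}$
$InternalEdges(\overline{v}_1) = \{(1,2), (3,4), (2,3), (2,4)\}$
$InternalEdges(\overline{v}_2) = \{(5,6), (7,8), (5,7), (6,8)\}$
$InternalEdges(\overline{v}_0) = InternalEdges(\overline{v}_1) \cup InternalEdges(\overline{v}_2) \cup \{(1,5), (1,7), (4,7)\} = E$

External Edges

$ExternalEdges(\overline{v_l}_3) = \{(2,3), (2,4), (1,7), (1,5)\}$
$ExternalEdges(\overline{v_l}_4) = \{(3,2), (4,2), (4,7)\}$
$ExternalEdges(\overline{v_l}_5) = \{(6,8), (5,7), (5,1)\}$
$ExternalEdges(\overline{v_l}_6) = \{(8,6), (7,5), (7,1), (7,4)\}$
$ExternalEdges(\overline{v}_1) = \{(1,5), (1,7), (4,7)\}$
$ExternalEdges(\overline{v}_2) = \{(5,1), (7,1), (7,4)\}$
$ExternalEdges(\overline{v}_0) = \{\}$

Open Nodes

$OpenNodes(\overline{v_l}_3) = \{1,2\}$, $OpenNodes(\overline{v}_1) = \{1,4\}$
$OpenNodes(\overline{v_l}_4) = \{3,4\}$, $OpenNodes(\overline{v}_2) = \{5,7\}$
$OpenNodes(\overline{v_l}_5) = \{5,6\}$, $OpenNodes(\overline{v}_0) = \{\}$
$OpenNodes(\overline{v_l}_6) = \{7,8\}$

Figure 1. Example of a Graph and the respective SuperGraph.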


Definition 1: given a SuperNode, or LeafSuperNode, $\overline{v}$, an
edge $e$ is called an internal edge of $\overline{v}$ if
$source(e) \in Closure(\overline{v})$ and
$target(e) \in Closure(\overline{v})$. For this situation, we say that
“edge $e$ can be resolved within the closure of $\overline{v}$”. We
define the set of all the internal edges of a SuperNode $\overline{v}$
as $InternalEdges(\overline{v})$.

Definition 2: an edge $e$ is called an external edge of $\overline{v}$
if $source(e) \in Closure(\overline{v})$ and
$target(e) \notin Closure(\overline{v})$. Accordingly, we say that “edge
$e$ cannot be resolved within the closure of $\overline{v}$”. We define
the set of all the external edges of a SuperNode $\overline{v}$ as
$ExternalEdges(\overline{v})$.

Definition 3: a graph node $v$, $v \in Closure(\overline{v})$, is an
open node of $\overline{v}$ if there exists an external edge $e$,
$e \in ExternalEdges(\overline{v})$, so that $source(e) = v$. We define
the set of all the open nodes of a SuperNode $\overline{v}$ as
$OpenNodes(\overline{v})$.

In the next subsections we explain how to use these definitions in order
to extend the information that a SuperGraph can provide.
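Definitions 1 to 3 can be checked against the values listed in figure 1 with a short sketch. The function names below are illustrative assumptions; each function takes the closure of a SuperNode as a plain set, and each undirected edge is expanded into its two orientations so that source and target are defined for external edges.

```python
E = [(1, 2), (3, 4), (5, 6), (7, 8), (2, 3), (2, 4),
     (5, 7), (6, 8), (1, 5), (1, 7), (4, 7)]

def directed(edges):
    """Both orientations of every undirected edge."""
    return {(u, v) for u, v in edges} | {(v, u) for u, v in edges}

def internal_edges(cl, edges):
    """Definition 1: both endpoints lie inside the closure."""
    return {(u, v) for u, v in edges if u in cl and v in cl}

def external_edges(cl, edges):
    """Definition 2: the source is inside the closure, the target is outside."""
    return {(u, v) for u, v in directed(edges) if u in cl and v not in cl}

def open_nodes(cl, edges):
    """Definition 3: sources of the external edges."""
    return {u for u, _ in external_edges(cl, edges)}

closure_v1 = {1, 2, 3, 4}            # Closure of SuperNode v1 in figure 1
print(internal_edges(closure_v1, E))
print(external_edges(closure_v1, E))
print(open_nodes(closure_v1, E))
```

For the root, whose closure is the whole node set, the external edges and the open nodes are both empty.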

4.1. SuperNodes Connectivity 

We refer to the connections between groups of nodes in a graph hierarchy
as connectivity. Formally, the connectivity corresponds to equation 5.
According to the SuperGraph formalization, the connectivity for sibling
communities is readily available as part of the SuperGraph, at its
SuperEdges. For communities that are not siblings, or that are at
different levels of the hierarchy, the connectivity must be traced.

The challenge here is how to trace the connectivity between arbitrary
SuperNodes without having to cross the information of the hierarchy of
partitions with the information of the underlying graph (not available).
Instead, we are looking for a more scalable and efficient (viable)
procedure for large graphs. In order to perform this task we use the
open nodes information.

The open nodes information specifies all the nodes of a given SuperNode
that connect to nodes from other SuperNodes.

Theorem 1: given any two SuperNodes $\overline{v}_i$ and
$\overline{v}_j$, the set of all possible edges connecting
$\overline{v}_i$ to $\overline{v}_j$ is given by the Cartesian product
$OpenNodes(\overline{v}_i) \times OpenNodes(\overline{v}_j)$.

From equations 3 and 5 it follows that given a SuperNode
$\overline{v}_f = \{V', E'\}$, for any pair of sibling SuperNodes
$\{\overline{v}_g, \overline{v}_h\} \subseteq V'$, $\overline{v}_f$ is
the unique SuperNode that contains the edges connecting any pair of
SuperNodes $\{\overline{v}_i, \overline{v}_j\}$,
$\{\overline{v}_i, \overline{v}_j\} \subseteq \{Closure(\overline{v}_g) \times Closure(\overline{v}_h)\}$.

Theorem 2: the set of edges that actually connect any two SuperNodes
$\overline{v}_i$ and $\overline{v}_j$ is a subset of the unique
SuperEdge $\overline{e}_{g,h}$, which satisfies
$\overline{v}_i \in Closure(\overline{v}_g)$ and
$\overline{v}_j \in Closure(\overline{v}_h)$;
$\{\overline{v}_g, \overline{v}_h\} \subseteq \overline{v}_f$.

Intuitively, $\overline{v}_f$ is the first common parent of
$\overline{v}_i$ and $\overline{v}_j$.

To determine the set of edges that connect any two SuperNodes
$\overline{v}_i$ and $\overline{v}_j$, we have to compute the
intersection between the set of all possible edges between
$\overline{v}_i$ and $\overline{v}_j$ (theorem 1) and the set that
contains the actual edges between $\overline{v}_i$ and $\overline{v}_j$
(theorem 2). That is:

$$Connectivity(\overline{v}_i, \overline{v}_j) = \{OpenNodes(\overline{v}_i) \times OpenNodes(\overline{v}_j)\} \cap \{\overline{e}_{g,h} \mid \overline{v}_i \in Closure(\overline{v}_g),\ \overline{v}_j \in Closure(\overline{v}_h)\} \tag{7}$$

The SuperNode connectivity tells the relation between any pair of
SuperNodes in a way that makes it possible to determine how many and
which, exactly, are the graph nodes that determine the connectivity.
This possibility extends the analysis for graph partitions because the
SuperNodes are inspected either as sole entities or as groups of
entities descending from the underlying graph.

4.2. Graph Nodes Connectivity

A graph hierarchy uses the relationships among the graph nodes in order
to define groups of related graph nodes. But the relationships between
graph nodes at different groups of nodes are not part of the graph
hierarchy. That is, the original graph is lost because the external
edges information is not kept.

In a SuperGraph, with the aid of the open nodes information, we can
determine the complete set of external edges relative to any graph node.

From equation 5 and definition 3 it follows that given a graph node $v$,
for any SuperNode $\overline{v}$ such that
$v \in OpenNodes(\overline{v})$, there are one or more edges
$e = \{v, w\}$ such that $e \in \overline{e}$ and $\overline{e}$ belongs
to a SuperNode in $Parents(\overline{v})$.

Theorem 3: if a graph node $v$ is an open node for a SuperNode
$\overline{v}$, then the set of parents $Parents(\overline{v})$ has all
the SuperEdges that hold edges connected to $v$.

Thus, if we know the set of parents and the set of open nodes of a
SuperNode, we can determine the external edges of any graph node
$v \in OpenNodes(\overline{v})$. To do so, a reference to the first
parent SuperNode at each SuperNode is enough to define an incremental
recursive procedure that can trace the external edges of any graph node
of interest. Hence, while the graph node of interest is in the set of
open nodes of the current parent SuperNode, there are still external
edges to be traced. We just have to proceed upward in the hierarchy.

The graph node connectivity restores the original graph relationships
dynamically. This way, in a million-edge visualization, the user is
guided across the hierarchy of partitions and allowed to inspect a
particular node, instead of being overwhelmed by the huge volume of
data.
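The upward tracing procedure described above can be sketched as a loop over parent references. This is an illustrative reading of theorem 3, not the GMine implementation: the tables `PARENT`, `OPEN`, `SUPER_EDGES` and `LEAF_OF` are hand-written from figure 1, and the function name `external_neighbors` is an assumption.

```python
# Parent reference, open nodes and sibling SuperEdges taken from figure 1.
PARENT = {"vl3": "v1", "vl4": "v1", "vl5": "v2", "vl6": "v2",
          "v1": "v0", "v2": "v0", "v0": None}
OPEN = {"vl3": {1, 2}, "vl4": {3, 4}, "vl5": {5, 6}, "vl6": {7, 8},
        "v1": {1, 4}, "v2": {5, 7}, "v0": set()}
SUPER_EDGES = {"v1": [{(2, 3), (2, 4)}],
               "v2": [{(5, 7), (6, 8)}],
               "v0": [{(1, 5), (1, 7), (4, 7)}]}
LEAF_OF = {1: "vl3", 2: "vl3", 3: "vl4", 4: "vl4",
           5: "vl5", 6: "vl5", 7: "vl6", 8: "vl6"}

def external_neighbors(node):
    """Neighbors of `node` that lie outside its own LeafSuperNode."""
    found = set()
    current = LEAF_OF[node]
    # While the node is open at the current level, edges remain to be
    # traced, so move one level up and scan the parent's SuperEdges.
    while node in OPEN[current]:
        current = PARENT[current]
        for s_edge in SUPER_EDGES[current]:
            for u, v in s_edge:
                if u == node:
                    found.add(v)
                elif v == node:
                    found.add(u)
    return found

print(external_neighbors(7))
```

The loop stops as soon as the node is no longer open, so node 2, which is open only at its leaf, is traced one level up, while node 7 is traced up to the root.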


4.3. Integration to a data structure 


The SuperGraph abstraction and the open nodes information define a novel
structure model. This model provides a computational representation
suitable to perform the operations defined in sections 4.1 and 4.2. In
the next sections we illustrate the data structure used to implement
this model. We explain how to build it at the same time that we gather
the necessary information from the underlying graph.
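As one illustration of the operations of section 4.1, the SuperNode connectivity of equation 7 can be sketched on the data of figure 1. The code is a simplified reading under stated assumptions: the tables are hand-written, the identifiers and the function names `ancestors` and `connectivity` are hypothetical, and edges are compared in both orientations because the graph is undirected.

```python
from itertools import product

PARENT = {"vl3": "v1", "vl4": "v1", "vl5": "v2", "vl6": "v2",
          "v1": "v0", "v2": "v0", "v0": None}
OPEN = {"vl3": {1, 2}, "vl4": {3, 4}, "vl5": {5, 6}, "vl6": {7, 8},
        "v1": {1, 4}, "v2": {5, 7}, "v0": set()}
# SuperEdges between sibling SuperNodes, as listed in figure 1.
SUPER_EDGE = {("v1", "v2"): {(1, 5), (1, 7), (4, 7)},
              ("vl3", "vl4"): {(2, 3), (2, 4)},
              ("vl5", "vl6"): {(5, 7), (6, 8)}}

def ancestors(name):
    """The SuperNode itself followed by its parents up to the root."""
    chain = [name]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain

def connectivity(a, b):
    """Edges between SuperNodes a and b, following equation 7."""
    for g, h in product(ancestors(a), ancestors(b)):
        # g and h are the siblings under the first common parent.
        if g != h and PARENT[g] is not None and PARENT[g] == PARENT[h]:
            s_edge = SUPER_EDGE.get((g, h)) or SUPER_EDGE.get((h, g), set())
            candidates = product(OPEN[a], OPEN[b])      # theorem 1
            return {(u, v) for u, v in candidates       # theorem 2
                    if (u, v) in s_edge or (v, u) in s_edge}
    return set()

print(connectivity("vl3", "vl5"))
```

The two LeafSuperNodes in the usage line are not siblings, so the answer comes from the SuperEdge stored at the root, filtered by the Cartesian product of their open nodes.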


5. Graph-Tree Structure 

The Graph-Tree structure is intended to store and manage a SuperGraph.
Since a SuperGraph is also a graph, the Graph-Tree is a new structure
for graphs. Different from classic graph structures such as adjacency
matrices and adjacency lists, the Graph-Tree manages a graph according
to a hierarchy of communities-within-communities. We explore this
approach for large graph processing and visualization. To do so, the
Graph-Tree is composed of SuperNodes that are sets of SuperNodes, and
LeafSuperNodes that are sets of nodes. The latter hold references to
files storing subgraph information, one file per LeafSuperNode. These
subgraphs are loaded into memory (expand LeafSuperNode task) and
released from it (collapse LeafSuperNode task) only when necessary,
allowing for compartmented processing and presentation.



Figure 2. Graph-Tree example. The subgraphs at the bottom of the tree
are stored in files.


For illustration, in figure 2 we present the SuperGraph of figure 1
stored in a Graph-Tree. Notice how the tree adjusts to the SuperNodes
reflecting a hierarchical arrangement. For example, SuperNode
$\overline{v}_2 = \{\overline{v_l}_5, \overline{v_l}_6\}$ becomes parent
of SuperNodes $\overline{v_l}_5$ and $\overline{v_l}_6$. The SuperEdges
are stored in the SuperNodes’ parent that holds references to the
respective SuperNodes. For example, SuperNode $\overline{v}_2$ keeps the
references of LeafSuperNodes $\overline{v_l}_5$ and $\overline{v_l}_6$;
consequently, it holds SuperEdge $\overline{e}_{5,6}$. At the bottom of
the tree we have subgraphs and their respective graph nodes and edges.

Components of the structure 

To hold a SuperGraph, the Graph-Tree uses five substructures to
represent the concepts introduced in section 3: open node (openNode),
edge (edge), SuperEdge (sEdge), LeafSuperNode (lNode) and SuperNode
(sNode). The first one, openNode, is an alias for a node id; it refers
to a node from a given community. The edge structure is used to abstract
a relation (edge) between two nodes. The sEdge structure is used to
abstract a set of edges between two SuperNodes. The lNode structure
represents a LeafSuperNode and the sNode structure represents a
SuperNode. Figure 3 details and exemplifies each of these structures.
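The five substructures can be sketched as Python dataclasses. The field lists follow the ones shown in figure 3 (ids, parent pointer, arrays of sons, SuperEdges and open nodes); the Python class and attribute names, as well as the file names in the usage lines, are illustrative assumptions and not the original implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

OpenNode = int  # openNode: an alias for a graph node id

@dataclass
class Edge:
    s_nid: int   # source node id
    d_nid: int   # destination node id

@dataclass
class SEdge:
    s_snid: int  # source SuperNode id
    d_snid: int  # destination SuperNode id
    edges: List[Edge] = field(default_factory=list)

@dataclass
class LNode:
    id: int
    file_path: str  # file that stores the subgraph of this leaf
    parent: Optional["SNode"] = field(default=None, repr=False, compare=False)
    nodes: List[int] = field(default_factory=list)
    s_edge: Optional[SEdge] = None   # edges internal to the leaf
    open_nodes: List[OpenNode] = field(default_factory=list)

@dataclass
class SNode:
    id: int
    parent: Optional["SNode"] = field(default=None, repr=False, compare=False)
    sons: List[Union["SNode", LNode]] = field(default_factory=list)
    s_edges: List[SEdge] = field(default_factory=list)
    open_nodes: List[OpenNode] = field(default_factory=list)

# SuperNode v2 of figure 1 with its two LeafSuperNodes (hypothetical file names).
leaf5 = LNode(5, "subgraph5.txt", nodes=[5, 6],
              s_edge=SEdge(5, 5, [Edge(5, 6)]), open_nodes=[5, 6])
leaf6 = LNode(6, "subgraph6.txt", nodes=[7, 8],
              s_edge=SEdge(6, 6, [Edge(7, 8)]), open_nodes=[7, 8])
v2 = SNode(2, sons=[leaf5, leaf6],
           s_edges=[SEdge(5, 6, [Edge(5, 7), Edge(6, 8)])], open_nodes=[5, 7])
for son in v2.sons:
    son.parent = v2
```

The parent field is excluded from `repr` and comparison so that the parent and son references do not recurse into each other.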

6. Building a Graph-Tree from a graph 

In this section we describe how to build a Graph-Tree starting from a
graph. We illustrate all the steps in order to explain the process and
to clarify the Graph-Tree structure, its arrangement and the information
it manages.

edge[sNId, dNId]
sEdge[sSNId, dSNId, array[0...(#edges - 1)] of edge]
openNode[nodeId]
lNode[id, filePath, PtrParent,
      array[0...(#nodes - 1)] of nodeIdType, sEdge,
      array[0...(#openNodes - 1)] of openNode]
sNode[id, PtrParent,
      array[0...(#sons - 1)] of sNode OR lNode,
      array[0...(#sEdges - 1)] of sEdge,
      array[0...(#openNodes - 1)] of openNode]

Figure 3. Graph-Tree components.

Hierarchy construction

Given a graph G={V,E}, we recursively apply the k-way partitioning
(section 2). We perform a sequence of recursive partitionings to achieve
a hierarchy of communities-within-communities. At each recursion, each
partition is submitted to a k-way partitioning cycle that will create
another set of k partitions. These partitions are propagated to the next
level of the tree and the process repeats until we get the desired
number h of hierarchy levels. For each new set of partitions, a new
subtree is embedded in the Graph-Tree structure and the references for
the graph nodes are kept at the bottom level of the tree.
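The recursion can be sketched as follows. Note the stand-in: `split_k_way` below simply cuts the node list into contiguous chunks and ignores the edges, so it is not the k-way partitioning of Karypis and Kumar [5]; it only makes the sketch runnable. The nested-dictionary tree, the function names and the reading of `levels` as the number of partitioning rounds are assumptions.

```python
def split_k_way(nodes, k):
    """Placeholder partitioner: k contiguous, near-equal chunks (edges ignored)."""
    nodes = list(nodes)
    size, extra = divmod(len(nodes), k)
    parts, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        parts.append(nodes[start:end])
        start = end
    return parts

def build_hierarchy(nodes, k, levels):
    """Build an empty tree: `levels` rounds of k-way partitioning below the root."""
    if levels == 0:
        return {"nodes": list(nodes)}      # LeafSuperNode: keeps graph node ids
    return {"sons": [build_hierarchy(part, k, levels - 1)
                     for part in split_k_way(nodes, k)]}

def count_leaves(tree):
    """Number of LeafSuperNodes in the tree."""
    if "nodes" in tree:
        return 1
    return sum(count_leaves(son) for son in tree["sons"])

tree = build_hierarchy(range(1, 9), k=2, levels=2)
print(count_leaves(tree), tree["sons"][1]["sons"][0]["nodes"])
```

With eight nodes, k = 2 and two rounds, the sketch yields the same shape as figure 1: a root, two SuperNodes and four LeafSuperNodes of two graph nodes each.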

Filling the Graph-Tree SuperNodes 

After building the tree hierarchy based on the recursive partitioning,
it is necessary to fill the SuperNodes of the tree with the SuperEdge
and open nodes information. In algorithm 1 we benefit from the tree
structure to recursively scan the levels of the tree in a bottom-up
fashion. Initially the LeafSuperNodes are filled with information from
the subgraphs produced by the partitioning procedure. Then, we proceed
to upper levels where the SuperNodes use the external edges information
propagated from lower levels.

Algorithm 1 Algorithm to fill a Graph-Tree.

Require: Ptr: pointer to the root of the Graph-Tree
 1: FillGraphTree(Ptr)
 2: if Ptr is leaf then
 3:     Load subgraph file pointed by Ptr->filePath.
 4:     Instantiate and fill the SuperEdge array of edges and the array of son nodes for Ptr.
 5: end if
 6: else
 7:     for each son s_i of Ptr do
 8:         FillGraphTree(s_i) /* Recursively down the hierarchy */
 9:     end for
10:     Instantiate a SuperEdge for each pair of sons.
11:     Use the external edges information to look for cross references between sons.
12:     Store resolved edges in the SuperEdges.
13: end else
14: Use external edges to determine Ptr’s array of open nodes.
15: Propagate external edges information to parent.
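A runnable reading of algorithm 1 on the example of figure 1 is sketched below. It is a simplification under stated assumptions: the subgraph of a leaf is taken from an in-memory edge list instead of a file, SuperEdges are kept in a dictionary keyed by the names of the two sons, and the class `Node` and its attributes are hypothetical names. The comments refer to the step numbers of algorithm 1.

```python
from itertools import combinations

class Node:
    """Graph-Tree node; `nodes` is given only for LeafSuperNodes."""
    def __init__(self, name, sons=None, nodes=None):
        self.name = name
        self.sons = sons or []
        self.closure = set(nodes or [])
        self.super_edges = {}
        self.open_nodes = set()

def fill_graph_tree(ptr, edges):
    """Fill ptr bottom-up and return its unresolved (external) edges."""
    if not ptr.sons:
        cl = ptr.closure                               # steps 3-4 (no file I/O)
        ptr.super_edges[(ptr.name, ptr.name)] = {
            (u, v) for u, v in edges if u in cl and v in cl}
        external = {(u, v) for u, v in edges if u in cl and v not in cl}
        external |= {(v, u) for u, v in edges if v in cl and u not in cl}
    else:
        pending = {}
        for son in ptr.sons:                           # steps 7-9
            pending[son.name] = fill_graph_tree(son, edges)
            ptr.closure |= son.closure
        for a, b in combinations(ptr.sons, 2):         # step 10
            ptr.super_edges[tuple(sorted((a.name, b.name)))] = set()
        external = set()
        for a in ptr.sons:                             # steps 11-12
            for u, v in pending[a.name]:
                owner = next((b for b in ptr.sons if v in b.closure), None)
                if owner is None:
                    external.add((u, v))               # still unresolved
                else:
                    key = tuple(sorted((a.name, owner.name)))
                    ptr.super_edges[key].add((min(u, v), max(u, v)))
    ptr.open_nodes = {u for u, _ in external}          # step 14
    return external                                    # step 15

E = [(1, 2), (3, 4), (5, 6), (7, 8), (2, 3), (2, 4),
     (5, 7), (6, 8), (1, 5), (1, 7), (4, 7)]
leaves = [Node("vl3", nodes=[1, 2]), Node("vl4", nodes=[3, 4]),
          Node("vl5", nodes=[5, 6]), Node("vl6", nodes=[7, 8])]
v1, v2 = Node("v1", sons=leaves[:2]), Node("v2", sons=leaves[2:])
root = Node("v0", sons=[v1, v2])
fill_graph_tree(root, E)
print(root.super_edges, v1.open_nodes, v2.open_nodes)
```

Matching edges such as (2,3) and (3,2) are stored once, in normalized order, in the SuperEdge of the SuperNode where they are resolved; edges that find no owner among the sons are passed to the parent.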

Figure 4 illustrates this process. We start with graph G, which is
partitioned to create the Graph-Tree with empty SuperNodes (see figures
4(a), 4(b) and 4(c)). The bottom-up recursive process starts at the
leaves, illustrated in figure 4(d). For this illustration, and for
figure 4(e), matches between external edges are indicated in boldface
and gray external edges indicate unresolved external edges. Underlined
node ids indicate open nodes and the diagonal arrows depict the external
edges propagated up the tree. Still in figure 4(d), it is possible to
see the information propagated from nodes $\overline{v_l}_3$ and
$\overline{v_l}_4$, which will be used in step 11 of algorithm 1 to find
matches between unresolved external edges. As illustrated in figure
4(e), the crossing of the propagated information results in the matches
(2,3) with (3,2) and (2,4) with (4,2), stored in SuperEdge
$\overline{e}_{3,4}$. Figure 4(e) also shows the first SuperEdges among
siblings ($\overline{e}_{3,4}$ and $\overline{e}_{5,6}$) and another
propagation of information up the tree. Figure 4(f) shows the last
SuperEdge storing the last set of edges between siblings. Figure 4(g)
shows the end of the process, when no information is left for
processing.

7. Experiments with GMine 

GMine implements the partitioning of a graph and manages this
partitioning via integrated compartments. To do so, we use the
Graph-Tree structure, offering a set of interactivity tasks to visually
mine a SuperGraph. In what follows, we illustrate the functionalities of
GMine using two datasets. Due to space limitations it is not possible to
show all the GMine functionalities. Therefore, we have made GMine
available online at http://www.cs.cmu.edu/~junio/GMine, where the
software, datasets and videos can be downloaded.



Figure 4. Graph-Tree filling illustration. From (a) to (c), hierarchical
partitioning and empty Graph-Tree creation. From (d) to (g),
illustration of the algorithm used to fill the Graph-Tree.


Email-net dataset 

The first dataset, which is intentionally small, defines a semantic-rich
partitioning that was manually set in order to introduce the cognitive
characteristics of GMine. It comprises 81 nodes and 341 edges. Each node
represents an employee who belongs to a distinct company department. In
the first level, the employees are grouped according to their department
and in the second level according to their company (see figure 5(a)).
Each undirected edge of the graph represents electronic messages
transmitted between two nodes; the weights indicate the number of
messages. The visual interpretation of this graph aims at presenting the
interrelationship between the individuals, the departments and/or the
companies. This interrelationship is depicted by the number of messages
exchanged between the entities of the SuperGraph.

We first illustrate relationships between SuperNodes in figure 5(a). In
this illustration we present SuperEdges among companies and SuperEdges
among departments of the same company. Using equation 7 and under user
demand, we can calculate the relationship between departments of
different companies, highlighted in figure 5(b). On top of the
Graph-Tree structure, the GMine system can also track the specific nodes
(individuals) that exchange messages between two communities. On a
double-click event, the system presents these nodes (differentiated by
color) as a detailed bipartite subgraph in a separate window,
highlighted in figure 5(c). In GMine, each subgraph can be processed
totally independently of the rest of the visualization, including a set
of graph processing tasks (sampling, partitioning, force-directed or
page-rank based layouts), graph metric calculations (degree
distribution, components summary, hops) and rich interaction. Figure
5(d) shows how GMine permits digging down the SuperGraph hierarchy to
explore a specific community of nodes as a separate subgraph. It is
possible to interact with a community subgraph in parallel to other
community subgraphs, all in the context of the SuperGraph being
visualized.

Figure 5. GMine visual mining of a synthetic dataset.

DBLP dataset 

The second dataset originates from the Digital Bibliography & Library
Project (or DBLP). DBLP is a publicly available database of publication
data that embraces authors from the Computer Science community and their
published works; it is available at http://dblp.uni-trier.de/. The DBLP
dataset version that we use defines a graph with 315,688 nodes and
1,659,853 edges, where each node represents an author from this
community and each edge denotes a co-authoring relationship. In our
experiments, we used GMine to automatically create a recursive
partitioning of the DBLP dataset. The partitioning has 5 hierarchy
levels, each of them with 5 partitions. The dataset, thus, is broken
into $5^4 + 1$, or 626, communities with an average of 500 nodes per
community. The communities reflect the connectivity among their members
according to the k-way partitioning that, for this dataset, generates
communities oriented to highly collaborative authors and con-


Figure 6. (a) Overview of DBLP dataset, (b) Focus on community s34. (c)
Inspection of outlier.


Figure 6 presents an overview of the DBLP dataset. In figure 6(a), it is
possible to see DBLP partitioned into 5 communities in its first
hierarchy level, and another 5*5, or 25, communities in its second
hierarchy level. At this point, 3 communities are highly connected to
every other community and also highly connected among their 5
subcommunities. The other 2 first-level communities are relatively
isolated from the other 3 and totally isolated among their
subcommunities. One can conclude that the 3 highly connected communities
hold long-term collaborating authors, while the other 2 hold casual,
less productive authors who seldom interact with each other. In figure
6(b) we focus on community s034 and verify that its subcommunities are
isolated from each other. A deeper focus on community s034 in figure
6(c) shows that among its subcommunities (highlighted), only two of them
present an edge. Our system allows us to inspect this specific outlier
edge to reveal that authors “D. B. Miller” and “R. G. Stockton” define
this co-authoring relation for their unique DBLP publication, dated
1989.

Figure 7 presents a sequence of interactive actions performed by the
user when navigating in the DBLP dataset. Initially, we perform a label
query for author “Bin Wei”. In figures 7(b) and 7(c), we illustrate the
animation performed by GMine in order to show the graph node of
interest. On the left-hand side of the illustrations it is possible to
see the level tracker indicating, at each step, the level of the
hierarchy where the focus is. In figure 7(d) we reach the deepest level,
where the subgraphs of the LeafSuperNodes are. Figure 7(e) zooms in on
the direct relationships of author Bin Wei, which define a small
community of authors related to research on mobile computing. In figure
7(f) we use theorem 3 in order to retrieve the external neighbors for
our sample author and get the list exhibited in figure 7(g). This list
of authors indicates other communities where Bin Wei has research
interest, including scientific visualization and distributed
visualization.

Figure 7. GMine visual mining of DBLP. (a) Query label for author Bin
Wei. From (b) to (d), going down the graph hierarchy to find author Bin
Wei. (e) Zoom on the subgraph of interest. (f) and (g), retrieval of
external neighbors for the author of interest.

8. Conclusions 

We have presented GMine, a system for large graph visualization based on
a hierarchy of graph partitions. We have covered the details of our
system by delineating and extending the SuperGraph concept and by
introducing the Graph-Tree structure. We also demonstrated GMine using
two datasets. In the experiments GMine was able to process and present
different partitions of each dataset, allowing targeted presentation
under the user’s demand. The contributions of our work include
scalability via partitioned processing and presentation of large graphs;
extended analysis of a hierarchy of graph partitions by the integration
of its parts in an interactive environment; and, most important, the
possibility of concomitant functionalities for the hierarchy of graph
partitions and the original graph.

9. Acknowledgements 

This work was partly supported by CAPES (Brazilian Committee for
Graduate Studies), FAPESP (Sao Paulo State Research Foundation), CNPq
(Brazilian National Research Foundation) and the National Science
Foundation under Grants IIS-0209107, SENSOR-0329549 and IIS-0534205.
This work was also partly supported by the Pennsylvania Infrastructure
Technology Alliance (PITA) and by donations from Intel, NTT and
Hewlett-Packard. Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the authors and
do not necessarily reflect the views of the National Science Foundation,
or other funding parties.

References 

[1] E. Dahlhaus, J. Gustedt, and R. M. McConnell. Efficient and
practical algorithms for sequential modular decomposition. Journal of
Algorithms, 41:360-387, 2001.

[2] Peter Eades. Drawing clustered graphs on an orthogonal grid. Journal
of Graph Algorithms and Applications, 3(4):3-29, 1999.

[3] Peter Eades and Qing-Wen Feng. Multilevel visualization of clustered
graphs. In Graph Drawing 1996, LNCS 1190, pages 101-112.
Springer-Verlag.

[4] Yaniv Frishman and Ayellet Tal. Dynamic drawing of clustered graphs.
In Infovis 2004, pages 191-198.

[5] George Karypis and Vipin Kumar. Multilevel graph partitioning
schemes. In IEEE/ACM Int. Conference on Parallel Processing 1995, pages
113-122.

[6] T. Munzner. Exploring large graphs in 3d hyperbolic space. IEEE
Computer Graphics and Applications, 18(4):18-23, 1998.

[7] C. Papadopoulos and C. Voglis. Drawing graphs using modular
decomposition. In Graph Drawing 2005, LNCS 3843, pages 343-354.
Springer-Verlag.

[8] D. Schaffer, Z. Zuo, S. Greenberg, L. Bartram, J. Dill, S. Dubs, and
M. Roseman. Navigating hierarchically clustered networks through fisheye
and full-zoom methods. ACM Transactions on Computer-Human Interaction,
3(2):162-188, 1996.

[9] C. Walshaw and M. Cross. Mesh partitioning: a multilevel balancing
and refinement algorithm. SIAM J. Sci. Comput., 22(1):63-80, 2000.


Proceedings of the IEEE International Symposium on Multimedia, 2006 

Copyright IEEE 

